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. . . Abstract 

(^ ' Given a limited number of entries from the superposition of a low-rank matrix plus the product of a 

^ ^ ' known fat compression matrix times a sparse matrix, recovery of the low-rank and sparse components is 

C^ , a fundamental task subsuming compressed sensing, matrix completion, and principal components pursuit. 
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This paper develops algorithms for distributed sparsity-regularized rank minimization over networks, when 
the nuclear- and ^i-norm are used as surrogates to the rank and nonzero entry counts of the sought 
matrices, respectively. While nuclear-norm minimization has well-documented merits when centralized 
processing is viable, non-separability of the singular-value sum challenges its distributed minimization. 
To overcome this limitation, an alternative characterization of the nuclear norm is adopted which leads to 
O I a separable, yet non-convex cost minimized via the alternating-direction method of multipliers. The novel 

distributed iterations entail reduced-complexity per-node tasks, and affordable message passing among 

^ . single-hop neighbors. Interestingly, upon convergence the distributed (non-convex) estimator provably 

O ■ 

r~^ ■ attains the global optimum of its centralized counterpart, regardless of initialization. Several application 
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domains are outlined to highlight the generality and impact of the proposed framework. These include 



^>f-^ ' unveiling traffic anomalies in backbone networks, predicting networkwide path latencies, and mapping the 

o' 



RF ambiance using wireless cognitive radios. Simulations with synthetic and real network data corroborate 
the convergence of the novel distributed algorithm, and its centralized performance guarantees. 
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I. Introduction 

Let X := [xi^t] G M^^^ be a low-rank matrix [rank(X) < min(L,r)], and A := [aj^t] G M^^"^ be a 
sparse matrix with support size considerably smaller than FT. Consider also a matrix R := [r; j] € M^^^ 
and a set r2 C {1, . . . , L} X {1, . . . , T} of index pairs (/, t) that define a sampling of the entries of X. 
Given R and a number of (possibly) noise corrupted measurements 

F 
yi,t = xi^t + '^rijaf^t + vi^t, (/,*)€ (1) 

the goal is to estimate low-rank X and sparse A, by denoising the observed entries and imputing the 
missing ones. Introducing the sampling operator 'Pq{-) which sets the entries of its matrix argument not 
in to zero and leaves the rest unchanged, the data model can be compactly written in matrix form as 

Vn{Y)=Vn{^ + nA + V). (2) 

A natural estimator accounting for the low rank of X and the sparsity of A will be sought to fit the 
data Vq{Y) in the least-squares (LS) error sense, as well as minimize the rank of X, and the number of 
nonzero entries of A measured by its £o-(pseudo) norm; see e.g. ||9], lilTI . ||8], |[T2l for related problems 
subsumed by the one described here. Unfortunately, both rank and ^Q-norm minimization are in general 
NP-hard problems |[T3l . ll24l . Typically, the nuclear norm ||X||* := ^^k^kP^) (ca;(X) denotes the k-th 
singular value of X) and the £i-norm ||A||i := J2ft \^f,t\ ^r^ adopted as surrogates, since they are the 
closest convex approximants to rank(X) and ||A||o, respectively lITTI . |[25l . |[32l . Accordingly, one solves 

(PI) min ^||7'n(Y-X-RA)||2, + A*||X||, + Ai||A||i 

{X,A} 2 

where A*, Ai > are rank- and sparsity-controUing parameters. Being convex (PI) is appealing, and some 

of its special instances are known to attain good performance in theory and practice. For instance, when 

no data are missing (PI) can be used to unveil traffic anomalies in networks jITI . Results in f2\\ show 

that X and A can be exactly recovered in the absence of noise, even when R is a fat (compression) 

operator. When R equals the identity matrix, (PI) reduces to the so-termed robust principal component 

analysis (PC A), for which exact recovery results are available in |[8l and |[T2l . Moreover, for the special 

case R = OlxF, (PI) offers a low -rank matrix completion alternative with well-documented merits; see 

e.g., ifTOl and |l9l- Stable recovery results in the presence of noise are also available for matrix completion 

and robust PCA ^, |[35l . Earlier efforts dealing with the recovery of sparse vectors in noise led to similar 

performance guarantees; see e.g., |]6]. 

In all these works, the samples ^^^(Y) and matrix R are assumed centrally available, so that they can 

be jointly processed to estimate X and A by e.g., solving (PI). Collecting all this information can be 
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challenging though in various applications of interest, or may be even impossible in e.g., wireless sensor 
networks (WSNs) operating under stringent power budget constraints. In other cases such as the Internet or 
collaborative marketing studies, agents providing private data for e.g., fitting a low-rank preference model, 
may not be willing to share their training data but only the learning results. Performing the optimization 
in a centralized fashion raises robustness concerns as well, since the central processor represents an 
isolated point of failure. Scalability is yet another driving force motivating distributed solutions. Several 
customized iterative algorithms have been proposed to solve instances of (PI), and have been shown 
effective in tackling low- to medium-size problems; see e.g., |[2ll . ifTOl . ll25ll . However, most algorithms 
require computation of singular values per iteration and become prohibitively expensive when dealing 
with high-dimensional data |f261. All in all, the aforementioned reasons motivate the reduced-complexity 
distributed algorithm for nuclear and ^i-norm minimization developed in this paper. 

In a similar vein, stochastic gradient algorithms were recently developed for large-scale problems 
entailing regularization with the nuclear norm [|26l . Even though iterations in ll26l are highly paralellizable, 
they are not applicable to networks of arbitrary topology. There are also several studies on distributed 
estimation of sparse signals via ^i-norm regularized regression; see e.g., [|l4l . lUTI . [|22l . Different from 
the treatment here, the data model of ||22| is devoid of a low-rank component and all the observations Y 
are assumed available (but distributed across several interconnected agents). Formally, the model therein 
is a special case of ([2]) with T = 1, X = Ol-^t, and = {1, . . . , L} x {1, . . . , T}, in which case (PI) 
boils down to finding the least-absolute shrinkage and selection operator (Lasso) IIBTI . 

Building on the general model © and the centralized estimator (PI), this paper develops decentralized 
algorithms to estimate low-rank and sparse matrices, based on in-network processing of a small subset of 
noise-corrupted and spatially-distributed measurements (Section Hill) . This is a challenging task however, 
since the non-separable nuclear-norm present in (PI) is not amenable to distributed minimization. To 
overcome this hmitation, an alternative characterization of the nuclear norm is adopted in Section IIII-AI 
which leads to a separable yet non-convex cost that is minimized via the alternating-direction method 
of multipliers (AD-MoM) [Si. The novel distributed iterations entail reduced-complexity optimization 
subtasks per agent, and affordable message passing only between single-hop neighbors (Section IIII-CI ). 
Interestingly, the distributed (non-convex) estimator upon convergence provably attains the global optimum 
of its centralized counterpart (PI), regardless of initialization. To demonstrate the generality of the proposed 
estimator and its algorithmic framework, four networking- related application domains are outlined in 
Section HVl namely: i) unveiling traffic volume anomalies for large-scale networks ||T9l . lIlTI ; ii) robust 
PCA IHl, |[T2l ; iii) low-rank matrix completion for networkwide path latency prediction [,20,| . and iv) 
spectrum sensing for cognitive radio (CR) networks |@], ||22| . Numerical tests with synthetic and real 
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network data drawn from these application domains corroborate the effectiveness and convergence of the 
novel distributed algorithms, as well as their centralized performance benchmarks (Section |V}. 
Section |Vl] concludes the paper, while several technical details are deferred to the Appendix. 
Notation: Bold uppercase (lowercase) letters will denote matrices (column vectors), and calligraphic letters 
will be used for sets. Operators (•)', tr(-), (Tniax(')' © ^'^'i '^^ will denote transposition, matrix trace, 
maximum singular value, Hadamard product, and Kronecker product, respectively; | • | will be used for 
the cardinality of a set, and the magnitude of a scalar. The matrix vectorization operator vec(Z) stacks 
the columns of matrix Z on top of each other to return a supervector, and its inverse is unvec(z). The 
diagonal matrix diag(v) has the entries of v on its diagonal, and the positive semidefinite matrix M 
will be denoted by M ^ 0. The Ip-noxm of x G M" is ||x||p := {YJl=i \xi\^Y/^ for p > I. For two 
matrices M, U € M"^", (M, U) := tr(M'U) denotes their trace inner product. The Frobenious noiTn of 
matrix M = [mjj] G M"^p is \\M.\\f := ^^\iL{MM!), ||M|| := max||x||2=i ||Mx||2 is the spectral norm, 
||M||i := ^^ • |?TT.i.j| is the £i-norm, ||M||oo := maxjj |r?T,jj| is the £00 -norm, and ||M||* := ^^^i^ii^^) i^ 
the nuclear norm, where (Ti(M) denotes the i-th singular value of M. The n x n identity matrix will be 
represented by I„, while 0„ will stand for n x 1 vector of all zeros, and Onxp '■= 0^0' Similar notations 
will be adopted for vectors (matrices) of all ones. 

II. Preliminaries and Problem Statement 

Consider N networked agents capable of performing some local computations, as well as exchanging 
messages among directly connected neighbors. An agent should be understood as an abstract entity, e.g., 
a sensor in a WSN, a router monitoring Internet traffic; or a sensing CR from a next-generation commu- 
nications technology. The network is modeled as an undirected graph G{M,C), where the set of nodes 
TV^ := {1, . . . , N} corresponds to the network agents, and the edges (links) in C := {1, . . . , L} represent 
pairs of agents that can communicate. Agent n ^ J\f communicates with its single-hop neighboring peers 
in J7n^ and the size of the neighborhood will be henceforth denoted by |j7n|. To ensure that the data from 
an arbitrary agent can eventually percolate through the entire network, it is assumed that: 

(al) Graph G is connected; i.e., there exists a (possibly) multi-hop path connecting any two agents. 

With reference to the low-rank and sparse matrix recovery problem outlined in Section Jl in the network 
setting envisioned here each agent n ^ J\f acquires a few incomplete and noise-corrupted rows of matrix Y. 
Specifically, the local data available to agent n is matrix Vn^O^n), where Y„ G M^"^^, X]n=i ^n = L, 
and Y := [Y'^, . . . , Y^] = X + RA + V. The index pairs in Qn are those in i7 for which the row 
index matches the rows of Y observed by agent n. Additionally, suppose that agent n has available the 
local matrix R„ G R^"^^, containing a row subset of R associated with the observed rows in Y,j, i.e. 
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R := [R'^, . . . , R^] . Agents collaborate to form the wanted estimator (PI) in a distributed fashion, which 
can be equivalently rewritten as 

N 



mill y 
{X,A} ^ 



IwVnA^n - X„ - R„A)||| + ^||X||, + ^||A||i 



n=l L 

The objective of this paper is to develop a distributed algorithm for sparsity-regularized rank minimization 
via (PI), based on in-network processing of the locally available data. The described setup naturally 
suggests three characteristics that the algorithm should exhibit: cl) agent n ^ J\f should obtain an estimate 
of X„ and A, which coincides with the corresponding solution of the centralized estimator (PI) that uses 
the entire data 'Pf7(Y); c2) processing per agent should be kept as simple as possible; and c3) the overhead 
for inter-agent communications should be affordable and confined to single -hop neighborhoods. 

III. Distributed Algorithm for In-Network Operation 

To facilitate reducing the computational complexity and memory storage requirements of the distributed 
algorithm sought, it is henceforth assumed that: 

(a2) An upper bound p > rank{%) on the rank of matrix X obtained via (PI) is available. 
As argued next, the smaller the value of p, the more efficient the algorithm becomes. A small value of p 
is well motivated in various applications. For example, the Internet traffic analysis of backbone networks 
in |[T9l demonstrates that origin-to-destination flows have a very low intrinsic dimensionality, which renders 
the traffic matrix low rank; see also Section IIV-AI In addition, recall that rank(X) is controlled by the 
choice of A* in (PI), and the rank of the solution can be made small enough, for sufficiently large A*. 
Because rank(X) < p, (Pl)'s search space is effectively reduced and one can factorize the decision variable 
as X = LQ', where L and Q are L x p and T x p matrices, respectively. Adopting this reparametrization 
of X in (PI), one obtains the following equivalent optimization problem 



N 

(P2) min y 

{L,Q,A}^^ 



\\\Vn„{Yn - L„Q' - R„A)|||. + ^||LQ'||* + ^||A||i 



n=l 



which is non-convex due to the bilinear terms L„Q', and L := [L'^^, . . . , L'^] . The number of variables is 
reduced from LT + FT in (PI), to p{L + T) + FT in (P2). The savings can be significant when p is in the 
order of a few dozens, and both L and T are large. The dominant FT-term in the variable count of (P3) 
is due to A, which is sparse and can be efficiently handled even when both F and T are large. Problem 
(P3) is still not amenable to distributed implementation due to: (i) the non-separable nuclear norm present 
in the cost function; and (ii) the global variables Q and A coupling the per-agent summands. 
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A. A separable nuclear norm regularization 

To address (i), consider the following neat characterization of the nuclear norm 

||X||, := min ^ {||L|||. + ||Q||U , s. to X = LQ'. (3) 

{L,Q} 2 ^ ' 

For an arbitrary matrix X with SVD X = UxSxV'y, the minimum in ([3]l is attained for L = Uj^Sj^ 
and Q = Wx'^x ■ ^'^^ optimization Q is over all possible bilinear factorizations of X, so that the 
number of columns of L and Q is also a variable. Leveraging Q, the following reformulation of (P2) 
provides an important first step towards obtaining a distributed estimator: 

^ n A A 

As asserted in the following lemma, adopting the separable Frobenius-norm regularization in (P3) comes 
with no loss of optimality, provided the upper bound p is chosen large enough. 

Lemma 1: Under (a2), (P3) is equivalent to (PI). 

Proof: Let {X, A} denote the minimizer of (PI). Clearly, rank(X) < p, implies that (P2) is equivalent 
to (PI). From Q one can also infer that for every feasible solution {L,Q,A}, the cost in (P3) is no 
smaller than that of (P2). The gap between the globally minimum costs of (P2) and (P3) vanishes at 
A := A, L := US^/^^ and Q := VS^/^^ where X = USV'. Therefore, the cost functions of (PI) and 
(P3) are identical at the minimum. ■ 

Lemma [U ensures that by finding the global minimum of (P3) [which could have significantly less 
variables than (PI)], one can recover the optimal solution of (PI). However, since (P3) is non-convex, it 
may have stationary points which need not be globally optimum. Interestingly, the next proposition shows 
that under relatively mild assumptions on rank(X) and the noise variance, every stationary point of (P3) 
is globally optimum for (PI). For a proof, see Appendix A. 

Proposition 1: Let {L, Q,A} be a stationary point of (P3). //■ HPq (Y — LQ' - RA) || < A^, (no subscript 
in \\.\\ signifies spectral norm), then {X = LQ', A = A} is the globally optimal solution of (PI). 
Condition ||Pq(Y — LQ' — RA)|| < A* captures tacitly the role of p, the number of columns of L and 
Q in the postulated model. In particular, for sufficiently small p the residual ||'Pf7(Y — LQ' — RA)| 
becomes large and consequently the condition is violated (unless A* is large enough, in which case a 
sufficiently low-rank solution to (PI) is expected). This is manifested through the fact that the extra 
condition rank(X) < p in Lemma [T] is no longer needed. In addition, note that the noise variance certainly 
affects the value of ||'Pq(Y — LQ' — RA)||, and thus satisfaction of the said condition. 
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B. Local variables and consensus constraints 



To decompose the cost function in (P3), in which summands are coupled through the global variables Q 
and A [cf. (ii) at the beginning of this section], introduce auxiliary variables {Q„, A„}^^]^ representing 
local estimates of {Q, A} per agent n. These local estimates are utilized to form the separable constrained 
minimization problem 

N 
(P4) min 



{L,Q„,A„,B„}^^ 



\\\VnSYn - L„Q; - R„B„)||2, + A {iV||Lj|| + ||QJ|2,} + ^||A„||i 
s. to B„ = A„, n e M 

^n ^ ^m; A„. ^ ^m-i f^ S Urn Tl kz J\l . 

For reasons that will become clear later on, additional variables {B„}^^^ were introduced to split the 
£2— norm fitting-error part of the cost of (P4), from the £1— norm regularization on the {A„}^^^ (cf. 
Remark[3ll. These extra variables are not needed if R'R = Ip- The set of additional constraints B„ = A„ 
ensures that, in this sense, nothing changes in going from (P3) to (P4). Most importantly, (P3) and (P4) 
are equivalent optimization problems under (al). The equivalence should be understood in the sense that 
Qi = Q2 = ■ ■ ■ = Qat = Q and likewise for A, where {Q„,A„}„g^ and {Q,A} are the optimal 
solutions of (P4) and (P3), respectively. Of course, the corresponding estimates of L will coincide as well. 
Even though consensus is a fortiori imposed within neighborhoods, it extends to the whole (connected) 
network and local estimates agree on the global solution of (P3). To arrive at the desired distributed 
algorithm, it is convenient to reparametrize the consensus constraints in (P4) as 

Q„ = F;^, Q„ = F™, and F;^ = F;^, mej-n, neAT (4) 

A„ = G;;\ A™ = G;^, and G™ = G^, m € J^, neM (5) 

where {F™,F5^, G™, G^}™^^" are auxiliary optimization variables that will be eventually eliminated. 

C. The alternating-direction method of multipliers 

To tackle the constrained minimization problem (P4), associate Lagrange multipliers M„ with the 
splitting constraints B„ = A„, ?i G Af. Likewise, associate additional dual variables C^ and C^ (D™ and 
D^) with the first pair of consensus constraints in (|4) [respectively ©J. Next introduce the quadratically 
augmented Lagrangian function 

^ n A A 



n=l 
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N N 

+ ^{Mn, Bn-An) + ^Y1 ||B„ - AJ^ 
n=l n=l 

N 



n=l m&J„ 

N 

_i_ '' \ ^ \ ^ Jim "p"^i|2 _i_ im "^"^112 _i_iiA r""i|2_i_iiA r""i|2\ (f.\ 

n=l m&Jn 

where c is a positive penalty coefficient, and the primal variables are split into three groups Vi := 
{Q„, A„}^=i, V2 := {L„};^=i, and V3 := {B„„F-,F-, G™, G™}™|/". For notational convenience, 
collect all multipliers in M := {M„, C™, C^-, '^n-, ^n }^g;^"- Note that the remaining constraints in dD 
and dS]), namely Cy ■= {F™ = F^, G™ = G™, m £ Jn, n e TV}, have not been dualized. 

To minimize (P4) in a distributed fashion, a variation of the alternating-direction method of multipliers 
(AD-MoM) will be adopted here. The AD-MoM is an iterative augmented Lagrangian method especially 
well-suited for parallel processing ||5], which has been proven successful to tackle the optimization tasks 
encountered e.g., with distributed estimation problems ll22l . ll28l . The proposed solver entails an iterative 
procedure comprising four steps per iteration k = 1,2, . . . 

[SI] Update dual variables: 

Mn[k]=Mn[k-l]+fl{Bn[k]-An[k]), Tl^M (7) 

C™[A;] = C;r[A: - 1] + /x(Q„[fc] - F;r[fc]), n G AA, m G J^ (8) 

C;r[^] = ^n[k-l\+ ^^{Qm[k] - K[k]), n(^N,m(^Jn (9) 

D™[fc]=D™[fc-l]+/i(A„,[A;]-G™[A;]), n G AA, m G J„ (10) 

t>^[k]=t^^[k-l]+^^{Am[k]-G^[k]), n£M,meJn (11) 
[S2] Update first group of primal variables: 

Vi[A; + l] =arg mmC,{Vi,V2[k],V'i[k],M[k]) . (12) 

Vi 

[S3] Update second group of primal variables: 

V2[k + 1] = arg minC,{Vi[k + l],V2,V^[k],M[k]) . (13) 

V2 

[S4] Update auxiliary primal variables: 

V3[A; + l]=arg min Cc{Vi[k + l],V2[k + l],V^,M[k]) . (14) 

VsGCv 

This four-step procedure implements a block-coordinate descent method with dual variable updates. At 
each step while minimizing the augmented Lagrangian, the variables not being updated are treated as 
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Algorithm 1 : AD-MoM solver per agent n € M 



input Y„, R„, A,, Ai, c,/i 

initialize M„[0] = P„[0] = A„[l] = B„[l] = Ofxt, 0[0] = Orxp, and L„[l], Q„[l] at random 

for k = 1,2,... do 

Receive {Qm[fc], Am[fc]} from neighbors m G J'n 

[SI] Update local dual variables: 

M„[fc] = Mn[k - 1] + m(B„[A;] - A„[fc]) 

0„[fc] =On[k-l]+ /iE,„eJ„(Qn[fc] - Qm[k]) 
Pn[k] = P„[fc - 1] +flE,neJ„ (An[fc] ^ A,n[k]) 

[S2] Update first group of local primal variables: 



Q„[fc + 1] = argminq r„(L„[fc], Q, B„[fc]) + ^||Q||| + (0„[A:], Q) + c^. 



Q_ Q,Jfc]+Q„[fc] ^ 



A„[fc + 1] = [c(l + 2| J,:|)]-15a,/w (M„[fc] + cB„[fc] - P„[fc] +cJ2^^jJA4k] + A^[k])) 
[S3] Update second group of local primal variables: 
L„[fc + 1] = argmiHL {r„(L, Q„[fc + l],B„[fc]) + ^|1L1||} 
[S4] Update auxiliary local primal variables: 

B„[fc + 1] = argmine {r„(L„[fc + 1], Q„[fc + 1], B) + (M„[fc],B) + f 1|B - A„[fc + 1]||2,} 

Broadcast {Qn[k + 1], A„[fc +1]} to neighbors m e Jn 
end for 
return A„,Q„,L„ 



fixed and are substituted with their most up to date values. Different from AD-MoM, the alternating- 
minimization step here generally cycles over three groups of primal variables V1-V3 (cf. two groups in 
AD-MoM O). In some special instances detailed in Sections IIV-CI and IIV-D[ cycling over two groups of 
variables only is sufficient. In [SI], ;U > is the step size of the subgradient ascent iterations (l7Tl- (ITTI ). 
While it is common in AD-MoM implementations to select fi = c, a. distinction between the step size and 
the penalty parameter is made explicit here in the interest of generality. 

Reformulating the estimator (PI) to its equivalent form (P4) renders the augmented Lagrangian in <^ 
highly decomposable. The separability comes in two flavors, both with respect to the variable groups Vi, 
V2, and V3, as well as across the network agents n £ M. This in turn leads to highly parallelized, simplified 
recursions corresponding to the aforementioned four steps. Specifically, it is shown in Appendix B that 
if the multipliers are initialized to zero, [S1]-[S4] constitute the distributed algorithm tabulated under 
Algorithm [T] For conciseness in presenting the algorithm, define the local residuals r„(L„,Q„,B„) := 
^||7'n„(Y„ — linQ'n ~ R-nBn)!!!,. In addition, define the soft-thresholding matrix 5t-(]V[) with {i,j)-th 
entry given by sign(r7ij j) max{|?Tij j| — t,0}, where rriij denotes the {i,j)-th entry of IVI. 
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Remark 1 (Simplification of redundant variables): Careful inspection of Algorithm [T] reveals that the 
inherently redundant auxiliary variables and multipliers {F™,F^, G^, G™, C™,D™} have been elimi- 
nated. Agent n does not need to separately keep track of all its non-redundant multipliers {C™, D^}me>7„> 
but only to update their respective (scaled) sums On[k] ■= 2 X^^gj;^ C™[A:] and P„[A;] := 2 J2mej„ ^"nif^]- 
Remark 2 (Computational and communication cost): The main computational burden of the algorithm 
stems from solving unconstrained quadratic programs locally to update Q„, L„, B„, and to carry out simple 
soft-thresholding operations to update A„. On a per iteration basis, network agents communicate their 
updated local estimates {Q,j[/c], A.„[A;]} with their neighbors, to carry out the updates of the primal and 
dual variables during the next iteration. Regarding communication cost, Qn[^] is a T x p matrix and its 
transmission does not incur significant overhead when p is small. In addition, the F xT matrix A„[/i;] is 
sparse, and can be communicated efficiently. Observe that the dual variables need not be exchanged. 
Remark 3 (General sparsity-promoting regularization): Even though Ai||A||i was adopted in (PI) to 
encourage sparsity in the entries of A, the algorithmic framework here can accommodate more general 
structured sparsity-promoting penalties ip{A). To maintain the per-agent computational complexity at 
affordable levels, the minimum requirement on the admissible penalties is that the proximal operator 



prox^(Y) := argmin 



^\\Y-A\\l + ^P{A] 



(15) 



is given in terms of vector or (and) scalar soft-thresholding operators. In addition to the ^i-norm (Lasso 
penalty), this holds for the sum of row-wise ^2-norms (group Lasso penalty ||33l ). or, a linear combination 
of the aforementioned two - the so-termed hierarchical Lasso penalty that encourages sparsity across and 
within the rows of A |[29l . All this is possible since by introducing the cost-splitting variables B„, the 
local sparse matrix updates are A„[fc + 1] = prox^(Y„[fc]) for suitable Y„[A;] (see Appendix B). 

When employed to solve non-convex problems such as (P4), AD-MoM (or its variant used here) offers 
no convergence guarantees. However, there is ample experimental evidence in the literature which supports 
convergence of AD-MoM, especially when the non-convex problem at hand exhibits "favorable" structure. 
For instance, (P4) is bi-convex and gives rise to the strictly convex optimization subproblems (IT2l)-(IT4]i. 
which admit unique closed-form solutions per iteration. This obsei^vation and the linearity of the constraints 
endow Algorithm [1] with good convergence properties - extensive numerical tests including those presented 
in Section |V] demonstrate that this is indeed the case. While a formal convergence proof goes beyond 
the scope of this paper, the following proposition proved in Appendix C asserts that upon convergence. 
Algorithm [T] attains consensus and global optimality. 

Proposition 2: If the sequence of iterates {Q„[/c],L„[fc], A„[fc]}„g7^ generated by AlgorithmU] converge 

to {Qn,Ln) AnjneA^. '^wc? (al) holds, then: i) Q„ = Qm, A„ = A^, n, m € Af; and ii) if [[^^(Y — 
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LQ^ — RAi)|| < A*, then X = LQ'^ and A = Ai, where {A,X} is the global optimum of (PI). 

IV. Applications 

This section outlines a few applications that could benefit from the distributed sparsity-regularized rank 
minimization framework described so far. In each case, the problem statement calls for estimating low- 
rank X and (or) sparse A, given distributed data adhering to an application-dependent model subsumed 
by (O. Customized algorithms are thus obtained as special cases of the general iterations in Algorithm [l] 

A. Unveiling traffic anomalies in backbone networks 

In the backbone of large-scale networks, origin-to-destination (OD) traffic flows experience abrupt 
changes which can result in congestion, and limit the quality of service provisioning of the end users. 
These so-termed traffic volume anomalies could be due to external sources such as network failures, denial 
of service attacks, or, intruders which hijack the network services ll30l |[T9l . Unveiling such anomalies is 
a crucial task towards engineering network traffic. This is a challenging task however, since the available 
data are usually high-dimensional noisy link-load measurements, which comprise the superposition of 
unobservable OD flows as explained next. 

The network is modeled as in Section III and transports a set of end-to-end flows J" (with \T\ := F) 
associated with specific source-destinations pairs. For backbone networks, the number of network layer 
flows is typically much larger than the number of physical links (F ^ L). Single-path routing is considered 
here to send the traffic flow from a source to its intended destination. Accordingly, for a particular flow 
multiple links connecting the corresponding source-destination pair are chosen to carry the traffic. Sparing 
details that can be found in |2T1, the traffic Y := [y; j] G M^^^ carried over links / € C and measured at 
time instants t G [1,^] can be compactly expressed as 

Y = R(Z + A) + V (16) 

where the fat routing matrix R := [rij] G {0, 1}^^^ is fixed and given, Z := [zj^t] denotes the unknown 
"clean" traffic flows over the time horizon of interest, and A := [aj j] collects the traffic volume anomalies. 
These data are distributed. Agent n acquires a few rows of Y corresponding to the link-load traffic 
measurements Y,i € M^"^^ from its outgoing links, and has available its local routing table R„ which 
indicates the OD flows routed through n. Assuming a suitable ordering of links, the per agent quantities 
relate to their global counterparts in ([16} through Y := [Y^, . . . , Y^] and R := [R'^, . . . , R^] . 

Common temporal patterns among the traffic flows in addition to their periodic behavior, render most 
rows (respectively columns) of Z linearly dependent, and thus Z typically has low rank |[T9l . Anomalies 
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are expected to occur sporadically over time, and only last for short periods relative to the (possibly long) 
measurement interval [1, T]. In addition, only a small fraction of the flows are supposed to be anomalous 
at any given time instant. This renders the anomaly matrix A sparse across rows and columns. Given 
local measurements {Yn}neAf and the routing tables {R„}„g;\/, the goal is to estimate A in a distributed 
fashion, by capitalizing on the sparsity of A and the low-rank property of Z. Since the primary goal is 
to recover A, define X := RZ which inherits the low-rank property from Z, and consider [cf. (IT6l )1 

Y = X + RA + V. (17) 

Model (fTTl) is a special case of Q, when all the entries of Y are observed, i.e., Q = {1, . . . ,L} x 
{1, . . . , T}. Note that RA is not sparse even though A is itself sparse, hence principal components pursuit 
is not applicable here ||35l . Instead, the following estimator is adopted to unveil network anomalies ||2TI 

1 II T-« » Il2 ^* II-IAII '^1 



{X, A) = arg min > 



— Y„, — X„ — R„A p -\ — --r X L H — TT A 



2 II -„, .-„ -~,.--ii^ , jv" " N' 

which is subsumed by (PI). Accordingly, a distributed algorithm can be readily obtained by simplifying 
the general iterations under Algorithm [T] the subject dealt with next. 

Distributed Algorithm for Unveiling Network Anomalies (DUNA). For the specific case here in which 
Q = {1,...,L} X {1,...,T}, the residuals in Algorithm [T] reduce to r.„(L„, Q„,B„) := ^||Y„ — 
LnQn ~ R-nBnlll,. Accordingly, to update the primal variables Qn[k + 1], L„[A; + 1] and B„[fc + 1] 
as per Algorithm [T] one needs to solve respective unconstrained strictly convex quadratic optimization 
problems. These admit closed-form solutions detailed under Algorithm |2] The DUNA updates of the local 
anomaly matrices A„[A; + 1] are given in terms of soft-thresholding operations, exactly as in Algorithm [U 
Conceivably, the number of flows F can be quite large, thus inverting the F x F matrix R^Rn + dp 
to update B„[A; + 1] could be complex computationally. Fortunately, the inversion needs to be carried 
out once, and can be performed and cached off-line. In addition, to reduce the inversion cost, the 
SVD of the local routing matrices Rn = U/j^S/j^V^ can be obtained first, and the matrix inversion 
lemma can be subsequently employed to obtain [R^R„ + cl^]"^ = (1/c) [lp~^if„CV^ ], where 
C := diag ( ^^rp') •••) '^^^ ) ^^d p = rank(R„) ^ F. This computational shortcut is commonly adopted 
in statistical learning algorithms when ridge regression estimates are sought, and the number of variables 
is much larger than the number of elements in the training set |[T5l Ch. 18]. During the operational phase 
of the algorithm, the main computational burden of DUNA comes from repeated inversions of (small) 
p X p matrices, and parallel soft-thresholding operations. The communication overhead is identical to the 
one incurred by Algorithm [T] (cf. Remark |2). 

Remark 4 (Incomplete link traffic measurements): In general, one can allow for missing traffic data 
and the DUNA updates are still expressible in closed form. 
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Algorithm 2 : DUNA per agent n £ M 



input Y„, R„, A,, Ai, c,/i 

initialize M„[0] = P„[0] = A„[l] = B„[l] = Ofxt, 0[0] = Orxp, and L„[l], Q„[l] at random 

for k = 1,2,... do 

Receive {Qm[fc], Am[fc]} from neighbors m G J'n 

[SI] Update local dual variables: 

M„[fc] = Mn[k - 1] + m(B„[A;] - A„[fc]) 

Pn[k] = P„[fc - 1] +fiE,neJ„ (An[fc] ^ A„[A:]) 
[S2] Update first group of local primal variables: 

Q„[fc+1] = {Y'„L4k]-B'Jk]R'„L4k] ~ 04k]+cj:^^jJQn[k] + Qm[k])} [K[k]I^n[k] + (A./iV + 2c|J„|)Ip]-' 

An[k + 1] = [c(l + 2\Jn\)]-^Sx,/N {Mn[k] + cB„[fc] - Pn[k]+cJ2mejJ^n[k] + A^[k])) 

[S3] Update second group of local primal variables: 

L„[fc + 1] = (Y„ - R„B„[fc]) Q„[fc + 1] [QJJfc + l]Q„[fc + 1] + A,Ip]"^ 
[S4] Update auxiliary local primal variables: 

B„[fc + 1] = [R;R„ + clp]-^ {R;(Y„ - L„[fc + l]Q;[fc + 1]) - M„[fc] + cA„[fc + 1]} 

Broadcast {Qn[k + 1], A„[fc +1]} to neighbors m G Jn 
end for 
return A„,Q„,L„ 



B. In-network robust principal component analysis 

Principal component analysis (PCA) is the workhorse of high-dimensional data analysis and dimension- 
ality reduction, with numerous applications in statistics, networking, engineering, and the biobehavioral 
sciences; see, e.g., ifTSl . Nowadays ubiquitous e-commerce sites, complex networks such as the Web, and 
urban traffic surveillance systems generate massive volumes of data. As a result, extracting the most in- 
formative, yet low-dimensional structure from high-dimensional datasets is of paramount importance ||T5l . 

Data obeying postulated low-rank models include also outliers, which are samples not adhering to those 
nominal models. Unfortunately, similar to LS estimates PCA is very sensitive to the outliers lITSl . While 
robust approaches to PCA are available, recently polynomial-time algorithms with remarkable performance 
guarantees have emerged for low-rank matrix recovery in the presence of sparse - but otherwise arbitrarily 
large - errors |18], llT2l . [|35l . Robust PCA is of great interest in networking-related applications. One 
can think of distributed estimation using reduced-dimensionality sensor observations ||27| . and unveiling 
anomalous flows in backbone networks from Netflow data ||2] ; see also Section |V-B[ 
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In the network setting of Section JIJ each agent n ^ M acquires En outlier-plus-noise corrupted rows 
of matrix Y := [Y^, . . . , Y^]', where X]n=i ^n — ^- Local data can thus be modeled as Y„ = X„ + 
A„ + V,„, where X := [X'j^, . . . ,X'^]' has low rank. Agents want to estimate X,„ (and the outliers A„) 
in a distributed fashion by forming the global estimator ll35Tl 

N 

I „ A.. 

(18) 



|X, A| = are; min > 



ll"V" Y A l|2 ^ * llYll _L \ II A II 



which is once more a special case of (PI) when R = li;'. 

Distributed Robust Principal Component Analysis (DRPCA) Algorithm. Regarding the general dis- 
tributed formulation in (P4), the first constraint is no longer needed since R = li? [cf. the discussion after 
(P4)]. As agent n is interested in estimating A„ and ||A||i is separable over the rows of A, the only 
required constraints are Q„ = Qm, m ^ Jn, n £ N. These are associated with the dual variables 0„ 
per agent, and are updated according to Algorithm |3] All in all, each agent stores and recursively updates 
the primal variables {Q„,L„}, along with the F„ x T matrix A„. 

Mimicking the procedure that led to Algorithm [1] one finds that primal variable updates in DRPCA are 
expressible in closed form. In particular, the local outlier matrix A„ [k + 1] minimizes the Lasso cost 

An[k + 1] = arg min | -||Y„ - L„[fc + 1]Q'„[A; + 1] - A„|||, + Ai||A„|[i I 

{A„} [2 J 

and is given in terms of soft-thresholding operations as seen in Algorithm |3] [observe that A„[A; + 1] = 
proxiMi (Y„ — L„[A; + IJQ^JA; + 1]), where prox^,(-) is defined in (fTSlll. DRPCA iterations are simple 
with small p x p matrices inverted per iteration to update L„ and Q„ (see Algorithm |3]l. Regarding 
communication cost, each agent only broadcasts a T x p matrix Q„ to its neighbors. 

C. Distributed low-rank matrix completion 

The ability to recover a low-rank matrix from a subset of its entries is the leitmotif of recent advances 
for localization of wireless sensors |j23], Internet traffic analysis ||20l . ||34l . and preference modeling for 
recommender systems |l3]. In the low-rank matrix completion problem, given a limited number of (possibly) 
noise corrupted entries of a low-rank matrix X, the goal is to recover the entire matrix while denoising 
the observed entries, and accurately imputing the missing ones. 

In the network setting envisioned here, agent n € A/" has available L„ incomplete and noise-corrupted 
rows of Y := [Y'j^, . . . , Y^]'. Local data can thus be modeled as 'Pf7„(Y„) = 'Pq^(X„ + V„). Relying 
on in-network processing, agents aim at completing their own rows by forming the global estimator 



X = arg min >^ 

n=i ■- 



^ ^1 A 

-rn„(Y„-X„)||| + ^||X| 



(19) 
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Algorithm 3 : DRPCA algorithm per agent n G Af 



input Y„, A*, Ai,c, /i 

initialize A„[l] = Op^xT, O[0] = Oxxp, and L„[l], Q„[l] at random. 

for k = 1,2,... do 

Receive {Qm[fc]} from neighbors m E Jn 

[SI] Update local dual variables: 

0„[fc] = 0„[fc - 1] + yuE,neJ„(Qn[fc] - QmW) 

[S2] Update first group of local primal variables: 

Q„[fc+1] = {Y;L„[fc] - KW^M - 0„[fc] + cE„,ej„(QnW + Q„Jfc])} [L;[A:]L„[fc] + (A,/iV + 2c\J^\)\p 

[S2] Update second group of local primal variables: 

L„[fc + 1] = (Y„ - A„[fc]) Q„[fc + 1] [Q'Jfc + l]Q„[fc + 1] + AJp]-' 

[S3] Update third group of local primal variables: 

A„[fc + 1] = 5a, (Y„ - L„[fc + l]Q;[fc + 1]) 

Broadcast {Q„[fc + 1]} to neighbors m e Jn 
end for 
return A„,Q„,L„ 



which exploits the low-rank property of X through nuclear-norm regularization. Estimator ([19] was 
proposed in Q, and solved centrally whereby all data V<^^{Yr}) is available to feed an e.g., off-the- 
shelf semidefinite programming (SDP) solver. The general estimator in (PI) reduces to (IT9] l upon setting 
R- = OlxF and Ai = 0. Hence, it is possible to derive a distributed algorithm for low-rank matrix 
completion by specializing Algorithm [T] to the setting here. 

Before dwelling into the algorithmic details, a brief parenthesis is in order to touch upon properties 
of local sampling operators. Operator "Pq^ is a linear orthogonal projector, since it projects its matrix 
argument onto the subspace ^n '■= {Z G M^"^^ : supp(Z) € il„} of matrices with support contained 
in il„. Linearity of Vq^ implies that vec('Pn^(Z)) = Af7^^vec(Z), where Aq^ € M^"^^ is a symmetric 
and idempotent projection matrix that will prove handy later on. To characterize Aq^^, introduce an 
Ln X T masking matrix Cln whose (/,t)-th entry equals one when {l,t) € 0„, and zero otherwise. Since 
Vq^{Z) = Cln Z, from standard properties of the vec(-) operator it follows that An„ = diag(vec(ri„)). 
Distributed Matrix Completion (DMC) Algorithm. Going back to the general distributed formulation 
in (P4), since there is no sparse component A in the matrix completion problem (|T9] l. the only constraints 
that remain are Q„ = Qm, rn G J'n, n G M. These correspond to the dual variables On[k] per agent, and 
are updated as shown in Algorithm |4l 

In the absence of {An}neAf and the auxiliary variables {BnjnGAf' it suffices to cycle over two groups 
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of primal variables to arrive at the DMC iterations. The primal variable updates can be readily obtained by 
capitalizing on the properties of the vec() operator In particular, Algorithm (Tjindicates that the recursions 
for Q„ are given by [let q := vec(Q')] 

Qn[k + 1] = a.rgmhJ^\\rnjYr, - Ln[k]Q')\\l + ^IIQIIf 

Q Qn[k] + Q.m[k] ' 



+(0„[fc],Q) + c J2 

unvecf argmin<^ -||An,.vec(Y„) - Af7„(I L„[A;])qf + — ^||q 

vec{Qn'[k] + Qj[k]) ' 



(vec(0„'[fc]),q) + c '^ 



mej„ 



2 
Likewise, L„ is updated by solving the following subproblem per iteration (let 1 := vec(L)) 



(20) 



L„[fc + 1] = argmin|l||Pf,„(Y„ -LQ:,[A;+ IDIII, + ^||L| 



unvec { argmin<| i||Af,„vec(Y„) - Af,„(Q„[fc + 1] I^Jlf + ^||lf } ] . (21) 



Both (1201 ) and (1211) are unconstrained convex quadratic problems, which admit the closed-form solutions 
tabulated under Algorithm S] 

The per-agent computational complexity of the DMC algorithm is dominated by repeated inversions of 
p X p and pLn x pLn matrices to obtain E„[A; + 1] and D.„[A; + 1], respectively (see Algorithm |4]). Notice 
that E„[/c + 1] G WP'^^P'^ has block-diagonal structure with blocks of size px p. Inversion oi px p matrices 
is affordable in practice since p is typically small for a number of applications of interest (cf. the low-rank 
assumption). In addition, L„ is the number of row vectors acquired per agent which can be controlled by 
the designer to accommodate a prescribed maximum computational complexity. On a per iteration basis, 
network agents communicate their updated local estimates Qn[A:] only with their neighbors, in order to 
carry out the updates of primal and dual variables during the next iteration. In terms of communication 
cost, Qn[/i;] is a T X p matrix and its transmission does not incur significant overhead for small values 
of p. Observe that the dual variables 0„[/c] need not be exchanged, and the overall communication cost 
does not depend on the network size A^. 

D. Distributed sparse linear regression for spectrum cartography 

In a classical linear regression setting, training data {Y, R} related through Y = RA + V are given 
and the goal is to estimate the regression coefficient matrix A based on the LS criterion^. However, LS 

'Even though vector responses and model parameters are typically considered, i.e., y — Ra + v, the matrix notation is retained 
here for consistency. 
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Algorithm 4 : DMC algorithm per agent n G Af 



Input Y„,Ao„,A*,c, /i 

Initialize O[0] = Oxxp, and L„[l], Q„[l] at random 

for k = 1,2,... do 

Receive {Qm[fc]} from neighbors m E Jn 

[SI] Update local dual variables: 

0„[fc] = 0„[fc - 1] + yuE,neJ„(Qn[fc] - QmW) 

[S2] Update first group of local primal variables: 

E„[A: + 1] = {(It ® L;[A:])Ao„(It ® L„[fc]) + (A, /TV + 2c\J^\)\,tY^ 

Q;jfc+1] = unvec (E„[fc + 1] {(It ® L;[fc])Ao„vec(Y„) - vec(0;[fc]) + cvec(E™ejJQ;^W + Q;„W))}) 

[S3] Update second group of local primal variables: 

D„[A: + 1] = {(Q;jfc + 1] ® lLjAo„(Q„[fc + 1] ® I^J + A,IpL„}"' 

L„[fc + 1] = unvec (D„[fc + 1] (QJJfc + 1] ® I^ JAn„vec(Y„)) 

Broadcast {Qn[fc + 1]} to neighbors m e >/„ 
end for 
Return Q„,L„ 



often yields unsatisfactory prediction accuracy and fails to provide parsimonious estimates; see e.g., ifTSl . 
To deal with such limitations, one can adopt a regularization technique known as Lasso ||3T| . 

The general framework in this paper can accommodate distributed estimators of the regression coeffi- 
cients via Lasso, when the training data are scattered across different agents, and their communication to 
a central processing unit is prohibited for e.g., communication cost or privacy reasons. With reference to 
the setup described in Section JIl each agent has available the training data {Y„,R„} and wishes to find 

^ ri Ai 1 

Alusso := argminV" -||Y„, - R„A||| + — |!A||i (22) 

A -^ — ' I iV 

n=l L -I 

in a distributed fashion. The Lasso estimator (|22] | is a special case of (PI) in the absence of a low -rank 
component; that is when X = OlxT> and il = {1, . . . , L} x {1, . . . , T}. 

An application domain where this problem arises is spectrum sensing in cognitive radio (CR) networks 
whereby sensing CRs collaborate to estimate the radio-frequency power spectrum density maps <I>(x, /) 
across space x G M^ and frequency /; see lH, ll22l . These maps enable identification of opportunisti- 
cally available spectrum bands for re-use and handoff operation; as well as localization, transmit-power 
estimation, and tracking of primary user activities. 

A cooperative approach to spectrum cartography is introduced in ||4], based on a basis expansion model 
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of $(x, /). Spatially-distributed CRs collect smoothed periodogram samples Y„ of the received signal at 
given sampling frequencies, based on which they want to determine the unknown expansion coefficients in 
A. The sensing scheme capitalizes on two forms of sparsity: the first one introduced by the narrow-band 
nature of transmit-PSDs relative to the broad swaths of usable spectrum; and the second one emerging 
from sparsely located active radios in the operational space. All in all, locating the active transmitters boils 
down to a variable selection problem, which motivates well employment of the Lasso in (l22l ). Since data 
are collected by cooperating CRs at different locations, estimation of A amounts to solving a distributed 
parameter estimation problem which demands taking into account the network topology, and devising a 
protocol to share the data. All these are accomplished by the algorithm outlined next. 
Distributed Lasso (DLasso) Algorithm Il22l . In the Lasso setting here, (P4) reduces to 



mill > 

{A„,B4^^ 



II V _P} Pi l|2 _L '^lll A II 
„ II ■•■ n ^^n^-'n\\p I ^^||-^^n||l 



n=l L 

s. to B„ = A„, n ^ J\f 

An = Am, m e Jn, n e M 



which is a convex optimization problem equivalent to (|22| |. Accordingly, the DLasso algorithm (tabulated 
under Algorithm |5]l is readily obtained from Algorithm [1] after retaining only the update recursions for 
{A„,B„} and the multipliers {M„,P„}. A distributed protocol to select Ai via A'-fold cross valida- 
tion |[T5l Ch. 7] is also available in [22], along with numerical tests to assess its performance. 

Since Lasso in (l22l) and its separable constrained reformulation are convex optimization problems, the 
general convergence results for AD-MoM iterations can be invoked to establish convergence of DLasso 
as well. A detailed proof of the following proposition can be found in |[22l App. E]. 

Proposition 3: Under (al) and for any value of the penalty coefficient c > 0, the iterates A„[A;] converge 
to the Lasso solution [cf (1221 )/ as k ^ oo, i.e., limfc_!.oo A„[A;] = Ai„sso, Vn € Af. 

V. Numerical Tests 

This section corroborates convergence and gauges performance of the proposed algorithms, when tested 
on the applications of Section |IV] using synthetic and real network data. 

Synthetic network data. A network of iV = 20 agents is considered as a realization of the random 
geometric graph model, that is, agents are randomly placed on the unit square and two agents communicate 
with each other if their Euclidean distance is less than a prescribed communication range of 0.35; see Fig. [T] 
The network graph is bidirectional and comprises L = 106 links, and F = N{N — 1) = 380 OD flows. 
The entries of V are independent and identically distributed (i.i.d.), zero-mean, Gaussian with variance 
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Algorithm 5 : DLasso per agent n & M 



input Y„,R„,Ai,c, ^ 

initialize M„[0] ^ P„[0] = A„[l] ^ B„[l] = O^xt 

for k = 1,2,... do 

Receive {Am[^"]} from neighbors m E Jn 

[SI] Update local primal variables: 

M„[fc] = M„[fc - 1] + c(B„[fc] - A„[fc]) 

V,\k\ = P„[fc - 1] + cE„,ej„(A„[fc] - A„,[fc]) 

[S2] Update primal variables: 

A„[fc + 1] = [c(l + 2|J-„|)]-i5A,/Ar (M„[fc] + cB„[fc] - P„[fc] + cE™6j„(A„[fc] + KJ,k\)) 

[S3] Update auxiliary local primal variables: 

B„[fc + 1] = [R;R„ + cIf]-i {R;Y„ - M„[fc] + cA„[A: + 1]} 

Broadcast A„[/c + 1] to neighbors m G >/„ 
end for 
return A,, 



o"^; i.e., vji ~ A^(0, (T^). Low-rank matrices with rank r are generated from the biUnear factorization 
model Xq = WZ', where W and Z are L x r and T y.r matrices with i.i.d. entries drawn from Gaussian 
distributions A^(0, 100/F) and A^(0, 100/T), respectively. Every entry of Aq is randomly drawn from the 
set {—1,0,1} with Pr(ajj = —1) = Pr(aij = 1) = 7r/2. Unless otherwise stated, r = 3, p = 3 and 
T = F = 380 are used throughout. Different values of a, and vr are examined. 

Internet! network data. Real data including OD flow traffic levels and end-to-end latencies are collected 
from the operation of the Internet2 network (Internet backbone network across USA) [Ij. Both versions 
of the Internet! network, referred as vl and v2, are considered. OD flow traffic levels are recorded for a 
three-week operation of Intemet2-vl during Dec. 8-28, 2008 ||T9|, and are used to assess performance of 
DUNA and DRPCA (see Sections IV-Al and |V-B| next). Internet2-vl contains iV = 11 agents, L = 41 links, 
and F = 121 flows. To test the DMC algorithm, end-to-end flow latencies are collected from the operation 
of Intemet2-v2 during Aug. 18-22, 2011 Q. The Intemet2-v2 network comprises A^ = 9 agents, L = 26 
links, and F = 81 flows. 

Selection of tuning parameters. The sparsity- and rank-controlling parameters Ai and A^, are tuned to 
optimize performance. The optimality conditions for (PI) indicate that for Ai > ||R'Y||oo and A* > ||Y||, 
{Xq = OlxTi Aq = Ofxt} is the unique optimal solution. This in turn confines the search space for 
Ai and A* to the intervals (0, ||R'Y||oo] and (0, ||Y||], respectively. In addition, for the case of matrix 
completion and robust PCA one can use the heuristic rules proposed in e.g., ||9l and lUl. 
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A. Unveiling network anomalies 

Data is generated from Y = R(Xo + Ao) + V, where the routing matrix R is obtained after determining 
shortest-path routes of the OD flows. For fi = c = 0.1, DUNA is run until convergence is attained. These 
values were experimentally chosen to obtain the fastest convergence rate. The time evolution of consensus 
among agents is depicted in Fig. [2] (left), for representative agents in the network. The metric of interest 
here is the relative error ||Qn[fe] — Q[/i;]||i7'/||Q[A;]||i? per agent n, which compares the corresponding local 
estimate with the network-wide average Q,[k] := -^ ^n=i Qn[k]', and likewise for the A„[A;]. Fig. |2](left) 
shows that DUNA converges and agents consent on the global matrices {Q, A} as A; — )• oo. 

To corroborate that DUNA attains the centralized performance, the accelerated proximal gradient al- 
gorithm of ETI is employed to solve (PI) after collecting all the per-agent data in a central processing 
unit. For both the distributed and centralized schemes. Fig. |2] (right) depicts the evolution of the relative 
error ||A[A;] — Ao||f/||Ao||f for various sparsity levels, where A.[k] := A[k] for DUNA. It is apparent 
that the distributed estimator approaches the performance of its centralized counterpart, thus corroborating 
convergence and global optimality as per Proposition |2] 

Unveiling Internet2-vl network anomalies from SNMP measurements. Given the OD flow traffic mea- 
surements discussed at the beginning of Section JV] the link loads in Y are obtained through multiplication 
with the Internet2-vl routing matrix lH]. Even though Y is "constructed" here from flow measurements, 
link loads can be typically acquired from simple network management protocol (SNMP) traces |30l. The 
available OD flows are a superposition of "clean" and anomalous traffic, i.e., the sum of unknown "ground- 
truth" low-rank and a sparse matrices Xq + Aq adhering to (|T6l ) when R = I^?. Therefore, the proposed 
algorithms are applied first to obtain a reasonably precise estimate of the "ground-truth" {Xq, Aq}. The 
estimated Xq exhibits three dominant singular values, confirming the low-rank property of Xq. 

The receiver operation characteristic (ROC) curves in Fig. |3] (left) highlight the merits of DUNA when 
used to identify Internet2-vl network anomalies. Even at low false alarm rates of e.g., PpA = 0.04, the 
anomalies are accurately detected (Pd = 0.93). In addition, DUNA consistently outperforms the landmark 
PCA-based method of |[T9l . and can identify anomalous flows. Fig. [3] (right) illustrates the magnitude of 
the true and estimated anomalies across flows and time. 

B. Robust PCA 

Next, the convergence and effectiveness of the proposed DRPCA algorithm is corroborated with the aid 
of computer simulations. An F x T data matrix is generated as Y = Xq + Aq + V, and the centralized 
estimator (ITST i is obtained using the AD-MoM method proposed in HI. In the network setting, each agent 
has available L„ = 19 rows of Y. Fig. |2] (right) is replicated as Fig. |4] (left) for the robust PCA problem 
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dealt with here, and for different values of p [the assumed upper bound on rank(Xo)]. It is again apparent 
that DRPCA converges and approaches the performance of (ITST i as A; ^ cx). 

Unveiling Internet2-vl network anomalies from Netflow measurements. Suppose a router n e Af 
monitors the traffic volume of OD flows to unveil anomalies using e.g., the Netflow protocol ||2l, ll30l . 
Collect the time-series of all OD flows as the rows of the F xT matrix Y = Xq + Aq + V, where Aq and 
V account for anomalies and noise, respectively. As elaborated in Section IIV-AI the common temporal 
patterns across flows renders the traffic matrix Xq low-rank. Owing to the difficulties of measuring the 
large number of OD flows, in practice only a few entries of Y are typically available ll30l . or, link traffic 
measurements are utilized as in Section IV- A [ (recall that L ^ F). In this example, because the Internet2-vl 
network data comprises only F = 121 flows, it is assumed that il = {1, ...,-F} x {1, ...,T}. 

To better assess performance, large spikes are injected into 1% randomly selected entries of the ground 
truth-traffic matrix Xq estimated in Section [V-AI The DRPCA algorithm is run on this Intemet2-vl Netflow 
data to identify the anomalies. The results are depicted in Fig. |4] (right). DRPCA accurately identifies the 
anomaUes, achieving Pq = 0.98 when PpA = 10~'^. 

C. Low-rank matrix completion 

In addition to the synthetic data specifications outlined at the beginning of this section, the sampling 
set Q. is picked uniformly at random, where each entry of the matrix fi is a Bernoulli random variable 
taking the value one with probability p. Data for the matrix completion problem is thus generated as 
Vn{Y) = Pj^(Xo + V) = fi (Xo + V), where Y is an L x T matrix with L = T = 106. The data 
available to agent n is 'Pf7^(Y„), which corresponds to a row subset of 'Pn{Y). 

As with the previous test cases, it is shown first that the DMC algorithm converges to the (centralized) 
solution of ( fT9l ). To this end, the centralized singular value thresholding algorithm is used to solve (|T9] l ifTOl . 
when all data 'Pn{Y) is available for processing. For both the distributed and centralized schemes. Fig. |5] 
(left) depicts the evolution of the relative error ||X[A;] — Xo||f/||Xo||_p for different values of a (noise 
strength), and percentage of missing entries (controlled by p). Regardless of the values of a and p, the 
error trends clearly show the convergent behavior of the DMC algorithm and corroborate Proposition 
Interestingly, for small noise levels where the estimation error approaches zero, the distributed estimator 
recovers Xq almost exactly. 

Internet2-v2 network latency prediction. End-to-end network latency information is critical towards 
enforcing quality-of-service constraints in many Internet applications. However, probing all pairwise delays 
becomes infeasible in large-scale networks. If one collects the end-to-end latencies of source-sink pairs 
{i,j) in a delay matrix X := [xij] G M^^^, strong dependencies among path delays render X low- 
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rank lEOl . This is mainly because the paths with nearby end nodes often overlap and share common 
bottleneck links. This property of X along with the distributed-processing requirements of large-scale 
networks, motivates well the adoption of the DMC algorithm for networkwide path latency prediction. 
Given the n-th row of X is partially available to agent n, the goal is to impute the missing delays through 
agent collaboration. 

The DMC algorithm is tested here using the real path latency data collected from the operation of 
Intemet2-v2. Spectral analysis of Xq reveals that the first four singular values are markedly dominant, 
demonstrating that Xq is low rank. A fraction of the entries in Xq are purposely dropped to yield an 
incomplete delay matrix 'Pn{X.Q). After running the DMC algorithm till convergence, the true and predicted 
latencies are depicted in Fig. |5] (right) (for 20% missing data). The relative prediction error is around 10%. 

VI. Concluding Summary 

A framework for distributed sparsity -regularized rank minimization is developed in this paper, that 
is suitable for (un)supervised inference tasks in networks. Fundamental problems such as in-network 
compressed sensing, matrix completion, and principal components pursuit are all captured under the same 
umbrella. The novel distributed algorithms can be utilized to unveil traffic volume anomalies from SNMP 
and Netflow traces, to predict networkwide path latencies in large-scale IP networks, and to map-out the 
RF ambiance using periodogram samples collected by spatially-distributed cognitive radios. 

Appendix 
A. Proof of Proposition [7] Recall the cost function of (P3) defined as 

/(L, Q, A) := ^\\Vn{Y - LQ' - RA)||2, + ^ (||L||| + ||Q|||) + Ai||A||i. (23) 

The stationary points {L, Q, A} of (P3) are obtained by setting to zero the (sub)gradients, and solving Q 

dAf{t, Q, A) = R'Pn(Y - LQ' - RA) - Aisign(A) = Oi^xT (24) 

Vl/(L, Q, A) = Po(Y - LQ' - RA)Q - A,L = O^xp (25) 

Vq,/(L, Q, A) = L'VniY - LQ' - RA) - A,Q' = O^xT- (26) 

Clearly, every stationary point satisfies Vl/(L,Q, A)L' = OlxL and QVq'/(L,Q, A) = OtxT- Using 
the aforementioned identities, the optimality conditions (I24l)-(l26l) can be rewritten as 

R'7'n(Y - LQ' - RA) = Aisign(A) (27) 

tr{rn{Y - LQ' - RA)QL') = A*ti-{QQ'} = A,tr{LL'}. (28) 
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Consider now the following convex optimization problem 



(P5) mill 

{X,A,Wi,W2} 



^llT'nlY - X - RA)||2, + ^ {tr{Wi} + trjWa}} + Ai||A||i 



, Wi X , 
s. to W := I I >i (29) 

X' W2 

which is equivalent to (PI). The equivalence can be readily inferred by minimizing (P5) with respect to 
{Wi, W2} first, and taking advantage of the following alternative characterization of the nuclear norm 
(see e.g., lH) 

1 / Wi X \ 

||X||, = mill -{tr(Wi)+tr(W2)}, s. to >r 0. 

{Wi,w.} 2 ^ ' ' I X' W2 / 

In what follows, the optimality conditions for the conic program (P5) are explored. To this end, the 
Lagrangian is first formed as 

/:(X,Wi,W2,A,M) = ^||Pn(Y-X-RA)||| + ^{tr(Wi)+tr(W2)}-(M,W) + ||A||i 

where M denotes the dual variables associated with the conic constraint (l29l ). For notational convenience, 
partition M in four blocks Mi := [M]ii, M2 := [M]i2, M3 := [M]22, and M4 := [M]2i, in accordance 
with the block structure of W in (l29l) . where Mi and M3 are L x L and T x T matrices. The optimal 
solution to (P5) must: (i) null the (sub)gradients 

Vx/:(X, Wi, W2, A,M) = -Vn(y-X- RA) - M2 - M4' (30) 

5A/:(X,Wi,W2,A,M) = -R'Pn(Y-X-RA)-Aisign(A) (31) 

Vw,/:(X,Wi,W2,A,M) = yIi-Mi (32) 

Vw./:(X,Wi,W2,A,M) = yIr-M3 (33) 

and satisfy (ii) the complementary slackness condition (M, W) = 0; (iii) primal feasibility W ^ 0; and 
(iv) dual feasibility M ^ 0. 

Recall the stationary point of (P3), and introduce candidate primal variables X := LQ', A := A, 
Wi := LL' and W2 := QQ'; and the dual variables Mi := ^I^, M3 := ^Ir, M2 := -(l/2)Pn(Y- 
LQ' — RA), and M4 := M2. Then, (i) holds since after plugging the candidate primal and dual variables 
in ([30t-(l33l). the subgradients vanish. Moreover, (ii) holds since 



(M,W) = (Ml, Wi) + (M2,X) + (M'2,X') + (M3,W2) 

= y (Il,LL') + ^(Lr,QQ') - {VniY - LQ' - RA),LQ') 

- '^*llf l|2 _L ^*||0||2 X llf l|2 - n 
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where the last two equahties follow from (I28l l. Condition (iii) is also met since 

LL' LQ' \ _ / L W L \ 
QL' QQ' ) ~ \Q ) \Q ) 

To satisfy (iv), based on a Schur complement argument |[T6l it suffices to enforce o"niax(M2) < A*. ■ 
B. Derivation of Algorithm |7] It is shown here that [S1]-[S4] in Section UlI-CI give rise to the set of 
recursions tabulated under Algorithm [T] To this end, recall the augmented Lagrangian function in Q and 
focus first on [S4]. From the decomposable structure of Cc, CSl) decouples into simpler strictly convex 
sub-problems for n G A/" and m ^ Jn, namely 

Bn[k + 1] = argmin {r„(L„[A: + 1], Q„[A: + 1],B„) + (M„[A;],B„) + |||B„ - A„[A: + 1]||2,} (35) 

¥^[k + 1] = arg mill {| (||Q„[fc + 1] - F-||| + ||Q^[fc + 1] - F™|||) - (C™[fe] + C^[klY^)} 

(36) 
G-[A: + 1] = arg min {| (||A„[fc + 1] - G^\\l + ||A„,[A: + 1] - G^\\l) - (D™[A:] + D™[^], G™)} . 

(37) 

Note that in formulating (l36l ) and (|37] ). the auxiliary variables F™ and G™ were eliminated using the 
constraints F™ = F^ and G^ = G™, respectively. The unconstrained quadratic problems (136] and (l37l) 
admit the closed-form solutions 

F-[fc + 1] = F™[A; + 1] = ^(C™[A;] + C-[A;]) + ^ (Q„[/c + 1] + Q„[^ + 1]) (38) 

G^[k + 1] = G™[A; + 1] = ^(D-[fc] + D-[fc]) + i (A„[A; + 1] + A„[A; + 1]) . (39) 

Using ( [38] ) to eliminate F™[A;] and F™[A;] from dSJ and @ respectively, a simple induction argument 
establishes that if the initial Lagrange multipliers obey C™[0] = -C^[0] = O^xp, then C™[A;] = -C™[A;] 
for all A; > 0, where n G M and ?n G J7„. Likewise, the same holds true for D™[A;] and D^[/c]. The 
collection of multipHers {C^[A:],D^[A:]}^'|^" is thus redundant, and dMJ-dHJ simpUfy to 

F^[k + 1] = F^'[k +M = l {Qn[k + 1] + Qrn[k + 1]) , UGM, 771 G Jn (40) 

G;r[A; + 1] = G™[A: + 1] = ^ (A„[A: + 1] + A„[A: + 1]) , n G AA, m G J"™. (41) 

Observe that F'^^ik] = F;^[A:] and Gn[k] = G^J/c] for all k > 0, identities that will be used later on. By 
plugging ( |40l ) and (|4TI ) into ([8]l and (ITOl ) respectively, the non-redundant multiplier updates become 

C^ [k] = C™ [k - 1] + ^ (Q„ [k] - Q^ [k] ) , n G AA, m G J-n (42) 

D^[A;] = D;r[fc-l] + ^(A„[^i-A„[A:]), n G AT, 7n G J-„. (43) 
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If C;^ [0] = -C;^[0] = Otxp, then the structure of (gljl reveals that C™[A:] = -C^^k] for all fc > 0, 
where n £ Af and m £ Jn- Clearly, the same holds true for DJ^[/i;], and these identities will become 
handy in the sequel. 

Moving on to [S3], (fT3l) decouples into \M\ unconstrained quadratic sub-problems 

l^n[k + 1] = argrnin j r„(L„,Q„[/c + 1],B„[A;]) + y||L„|||^ \ . 

The minimization (fT2l ) in [S2] also decomposes into simpler sub-problems, both across agents and across 
the variables {QnjneA^ and {A.n\neN^ which are decoupled in the augmented Lagrangian when all other 
variables are fixed. Specifically, the per agent updates of Q„ are given by 

Q„[A: + l]=argmin|r„(L„[A:],Q„,B„[A:]) + ^||Q„,|||+ J^ (C^TW + C^^i^], Qn) 

I mGj"„ 

+1 E (llQn-F-[fe]|||. + iiQ„-F;;[fc]|i^; 



mej„ 



A* 



argmin<| r„(L„[fc], Qn, B„[A;]) + ^||Qn||F + (On[A:],Q„) 






Qr 



Qn[k]+Qm[k] 



(44) 



where the second equality was obtained after using: i) C^[/c] = Cm[^] which follows from the identities 
C™[fe] = -C™[A;] and C^f [fc] = -C;^[A:] established earlier; ii) the definition 0„(A:) ■.= 2J2m€j„ C^TfA:]; 
and iii) the identity F^[fc] = F^[A;], which allows to merge the identical quadratic penalty terms and 
eliminate both F;^^[A;] and F^^lk] using ^. 

Upon defining Pn{k) ■= 2 X]mGj„ ^rTI^] ^^id following similar steps as the ones that led to the second 
equality in (l44l) . one arrives at 

{Mn[k],An) + ^\\Bn[k] - A„||^ + (P„[fc], A„) 



An[k+ 1] = argmin<; ^||A„||i 
A„ I iV 



+^E 

arg mm < — 
A„ I 2 



An[k]+Am[k] 



Mn[k] + cBn[k] - Pn[k] + cJ2meJ„ (^nW + A, 



cil + 2\Jn\ 



+ Ai||A„,||i 



(45) 



where the second equality follows by completing the squares. Problem (1451 ) is a separable instance of the 
Lasso (also related to the proximal operator of the ^i-norm); hence, its solution is expressible in terms of 
the soft-thresholding operator as in Algorithm [T] 
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C. Proof of Proposition ^ Let Q„ := linifc^oo Qn[^]. and likewise for all other convergent sequences in 
Algorithm [T] Examination of the recursion for On[fc] in the limit as fc — > oo, reveals that 'Y^mej IQ" ~ 
Qm] = Otxp, Vn € TV. Upon vectorizing the matrix quantities involved, this system of equations implies 
that the supervector q := [vec[Qi]', . . . ,vec[QAr]']' belongs to the nuUspace of L (g> Irp. where L is the 
Laplacian of the network graph G{N , C). Under (al), this guarantees that Qi = Q2 = . . . = Qat. From 
the analysis of the limiting behavior of Pn[A;], the same argument leads to Ai = A2 = . . . = A^r, which 
establishes the consensus results in the statement of Proposition |2] Hence, one can go ahead and define 
Q := Q,j and A := A„. Before moving on, note that convergence of M„[A;] implies that B„ = A„ = A, 
n £ M. These observations guarantee that the limiting solution is feasible for (P4). 

To prove the optimality claim it suffices to show that upon convergence, the fixed point {L, Q, A, B} of 
the iterations comprising Algorithm [T] satisfies the Karush-Kuhn-Tucker (KKT) optimality conditions for 
(P4). Proposition [1] asserts that if H'PnCY — LQ' — RA)|| < A*, {L, Q, A} is indeed an optimal solution 
to (PI). To this end, consider the updates of the primal variables in Algorithm [T] which satisfy 

VQ„r„(Q,[fc + l],Ln[k],Bn[k]) + ^Q„[A; + 1] + 0„[A: + 1] 






(46) 



VL„r„(Q„[A: + 1],L„[A; + l],B„[fc]) + A*L„[A: + 1] = O^xp (47) 

VB„r„(Q„[/c + l],Ln[k + l],Bn[k + 1]) + M„[A;] + c(B„[fc + 1] - A„[A; + 1]) = O^xT- (48) 
Taking the limit from both sides of (|461)-(|48]). and summing up over all n € TV yields 

VQr(Q,L,A) + A,Q = OTxp (49) 

VLr(Q,L,A) + A*L = OLxp (50) 

VBr(Q, L,A)+^Mn = OfxT (51) 

where r(L,Q,B) := |||Pn(Y-LQ'-RB)|||,. To arrive at dHJ, the assumption that C;^[l] = 0, V?n G 
Jr^neMis used, and thus C-[fc] = -C^[k] which leads to ZneM On[k] = EneJ^ T.m.ej^ ^n[k] = 0. 
Next, consider the auxiliary matrices 0„ := M„ — P„ + c(l + 2jj7n|)A, n G TV. In the limit as A; — ;• 00, 
the update recursion for A„[A; + 1] in Algorithm [1] can be written as c(l + 2jj7n|)A = 5(0„,Ai/A^) . 
Proceed by defining *„ := 0„ — c(l + 2\Jn\)A, and observe that the input-output relationship of the 
soft-thresholding operator S yields 

Ai/iV, [A]/,i > 0, 

[*n]/,t = <^ -Xi/N, [A]/,t < 0, (52) 
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Given dSH), define Ae matrices Ti := i (AiIfI^ + E^=i *n) and T2 := ^ (AiIj^I^ - ^^^^ *„), 
and siiow tiiat tiiey satisfy the following properi;ies: (i) ri,r2 > (entrywise); (ii) [ri]/-t = 0, if 
[A]f^t < 0; (iii) [T2]f,t = 0, if [A]f^t > 0; (iv) Ti + Ts = Ail^.!^; and (v) Ti - T2 = ZneM^n- 
Properties (i)-(iii) follow after adding up the result in (l52l ) for n = 1,2, ... ,N. Property (iv) is readily 
checked from the definitions of Fi and r2. Finally, (v) follows since 

N N N N N 

T,-T2 = J2^n = ^{@n- C(l + 2\Jn\)A) = J] M„ - ^ P„ = J] M„ 
71=1 n=l n=l n=l n=l 

where X]n=i -^" ~ ^ (from the identity J2n=i ^n[k] = 0) is used to obtain the last equality. 

The proof is concluded by noticing that properties (i)-(v) along with (|49l)-([5n) comprise the KKT 
conditions for the following optimization problem 

sr i??^^ IW^^i^ - LQ' - RA)||^ + ^ { IlLllI + ||Q||2,} + Ail^TlT 

s. to - T < A < T (entrywise) 

where {L, Q, A} and {ri,r2} play the role of the optimal primal and dual variables, respectively. This 
last problem is clearly equivalent to (P4). ■ 
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Fig. 1. A network of A^ = 20 agents. 
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Fig. 2. Performance of DUNA. (left) Relative consensus error for representative network agents with a — 0.01 and n = 0.01. 
(right) Relative estimation error for distributed and centralized algorithms under various sparsity levels. 
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Fig. 3. Unveiling anomalies from Intemet2-vl SNMP data, (left) ROC curves of the proposed versus the PCA-based method, 
(right) Amplitude of the true and estimated anomalies for p — 5, Pfa ~ 0.04 and Pd = 0.93. 
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Fig. 4. Performance of DRPCA. (left) Relative estimation error for distributed and centralized algorithms under different p. 
(right) Amplitude of true and estimated anomalies using Intemet2-vl network data when p = 5, Pfa = 10^'' and Pd 
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Fig. 5. Performance of DMC. (left) Relative estimation error for distributed and centralized algorithms under various noise 
strengths and percentage of missing entries, (right) Predicted and true ene-to-end delays of Intemet2-v2 network for p = 0.2. 



